Automatic extraction of property norm-like data from large text corpora

Author

Colin Kelly

Abstract

Traditional methods for deriving property-based representations of concepts from text have focused either on extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/holonymy (e.g., car has wheels), or on unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatically acquiring unconstrained, human-like property norms at large scale from text corpora, and discuss the theoretical implications of such a system. We use syntactic, semantic, and encyclopedic information to guide extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution) that approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights the triples with a linear combination of their frequency and four statistical metrics. We evaluate the system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. The system offers a viable, well-performing method for extracting plausible triples: the lexical comparison shows performance comparable to the current state of the art, and the subsequent evaluations demonstrate the human-like character of the generated properties.
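
The reweighting step (frequency plus four statistical metrics, combined linearly) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names neither its four metrics nor their weights, so the choices below are assumptions. These include conditional probability in both directions, pointwise mutual information, the Dice coefficient, a log-damped frequency term, min-max normalisation, treating the (relation, feature) pair as the "feature" side, and the hypothetical function names `metric_values` and `rerank`.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# A candidate triple: (concept, relation, feature), e.g. ("car", "require", "petrol").
Triple = Tuple[str, str, str]


def metric_values(triples: List[Triple]) -> Dict[Triple, Dict[str, float]]:
    """Compute a frequency term plus four illustrative (assumed) metrics per distinct triple.

    The (relation, feature) pair is treated as the feature side of each association.
    """
    triple_freq = Counter(triples)
    concept_freq = Counter(c for c, _, _ in triples)
    feature_freq = Counter((r, f) for _, r, f in triples)
    total = len(triples)

    values = {}
    for (c, r, f), n in triple_freq.items():
        n_c, n_f = concept_freq[c], feature_freq[(r, f)]
        values[(c, r, f)] = {
            "log_freq": math.log1p(n),                 # damped raw frequency
            "p_feat_given_concept": n / n_c,           # P(relation+feature | concept)
            "p_concept_given_feat": n / n_f,           # P(concept | relation+feature)
            "pmi": math.log(n * total / (n_c * n_f)),  # pointwise mutual information
            "dice": 2 * n / (n_c + n_f),               # Dice coefficient
        }
    return values


def rerank(triples: List[Triple],
           weights: Dict[str, float]) -> Dict[str, List[Tuple[str, str, float]]]:
    """Score each distinct triple as a weighted sum of min-max normalised metrics and
    return, per concept, its (relation, feature, score) list in descending score order."""
    values = metric_values(triples)
    lo = {m: min(v[m] for v in values.values()) for m in weights}
    hi = {m: max(v[m] for v in values.values()) for m in weights}

    def normalise(m: str, x: float) -> float:
        span = hi[m] - lo[m]
        return (x - lo[m]) / span if span > 0 else 0.0  # a constant metric contributes nothing

    scores = {t: sum(w * normalise(m, v[m]) for m, w in weights.items())
              for t, v in values.items()}

    ranked: Dict[str, List[Tuple[str, str, float]]] = defaultdict(list)
    # Sort by descending score, breaking ties alphabetically for deterministic output.
    for (c, r, f), s in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        ranked[c].append((r, f, s))
    return dict(ranked)


if __name__ == "__main__":
    # Toy extraction output: each tuple is one occurrence of a rule-matched triple.
    corpus = (
        [("car", "require", "petrol")] * 3
        + [("car", "be", "fast")] * 2
        + [("car", "have", "wheel")]
        + [("banana", "be", "yellow")] * 3
        + [("bike", "have", "wheel")] * 2
    )
    equal_weights = {m: 1.0 for m in ("log_freq", "p_feat_given_concept",
                                      "p_concept_given_feat", "pmi", "dice")}
    for concept, props in rerank(corpus, equal_weights).items():
        print(concept, [(r, f, round(s, 3)) for r, f, s in props])
```

Normalising each metric to [0, 1] before weighting keeps the weights on a comparable scale. One could tune them against held-out property norm data, although the abstract does not say how the authors set theirs.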

Similar resources

Automatic extraction of property norm-like features from large text corpora with gold standard, human and semantic-similarity evaluations

Property norms (e.g., banana is yellow, aeroplane has wings) play a key role in cognitive science, forming the basis for many recent theoretical accounts of conceptual representations (e.g., Cree et al., 2006; Grondin et al., 2009; Randall et al., 2004). Such norms are typically derived from norming studies in which a large number of human participants produce properties for a set of concepts (e.g....

Large-Scale Acquisition of Feature-Based Conceptual Representations from Textual Corpora

Methods for estimating people’s conceptual knowledge have the potential to be very useful to theoretical research on conceptual semantics. Traditionally, feature-based conceptual representations have been estimated using property norm data; however, computational techniques have the potential to build such representations automatically. The automatic acquisition of feature-based conceptual repr...

Vision and Feature Norms: Improving automatic feature norm learning through cross-modal maps

Property norms have the potential to aid a wide range of semantic tasks, provided that they can be obtained for large numbers of concepts. Recent work has focused on text as the main source of information for automatic property extraction. In this paper we examine property norm prediction from visual, rather than textual, data, using cross-modal maps learnt between property norm and visual spac...

Using Decision Trees and Text Mining Techniques for Extending Taxonomies

Lexical taxonomies have tree-like structures and can thus be extended to become decision trees that serve for their own extension. In this paper, a semi-automatic procedure for extending lexical taxonomies is proposed that makes use of term extraction methods for identifying new concepts and that uses co-occurrence data from large corpora to generate the necessary features (semantic descriptions...

Extracting a Parallel Corpus from Comparable Documents to Improve Translation Quality in Machine Translation Systems

Data used for training statistical machine translation methods are usually prepared from three resources: parallel, non-parallel and comparable text corpora. Parallel corpora are an ideal resource for translation, but because such texts are scarce, non-parallel and comparable corpora are also used for parallel text extraction. Most existing methods for exploiting comparable corpora loo...

Journal: Cognitive Science

Volume 38, Issue 4

Publication date: 2013
